Skip to content

feat(gateway/ws): tool-approval back-channel via WsApprovalChannel#6387

Merged
theonlyhennygod merged 5 commits intozeroclaw-labs:masterfrom
singlerider:feat/6207-gateway-approval-manager
May 6, 2026
Merged

feat(gateway/ws): tool-approval back-channel via WsApprovalChannel#6387
theonlyhennygod merged 5 commits intozeroclaw-labs:masterfrom
singlerider:feat/6207-gateway-approval-manager

Conversation

@singlerider
Copy link
Copy Markdown
Collaborator

@singlerider singlerider commented May 5, 2026

Summary

The agent's tool loop already iterates channel_handles.ask_user looking for a Channel that can answer ChannelApprovalRequest. The gateway WS path constructed an Agent but never registered itself as such a channel, so any supervised tool call routed through /ws/chat would auto-deny with no operator surface.

Two-part fix:

  • New TurnEvent::ApprovalRequest { request_id, tool_name, arguments_summary, timeout_secs } variant on the runtime side.
  • WsApprovalChannel in the gateway. When the agent calls request_approval, the channel mints a request_id, emits the event over a connection-level mpsc, and parks on a oneshot waiter. The WS forward task drains the mpsc onto the wire; inbound approval_response frames pop the waiter from the shared pending map. Timeouts and disconnects auto-deny so the agent never sees Ok(None) and never falls back to the implicit no-op path.

Wire format:

Server -> Client: { "type": "approval_request",
                    "request_id": "<uuid>", "tool": ...,
                    "arguments_summary": ...,
                    "timeout_secs": 120 }
Client -> Server: { "type": "approval_response",
                    "request_id": "<uuid>",
                    "decision": "approve" | "deny" | "always" }

arguments_summary is the redacted summary the runtime synthesises via zeroclaw_runtime::approval::summarize_args; raw args (including secret-bearing fields) never reach the wire.

The ACP channel keeps its own request_choice path; ApprovalRequest events are dropped on the ACP transport since approvals there flow through session/request_permission.

Closes #6207

Validation Evidence

cargo fmt --all -- --check
cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings
cargo test --package zeroclaw-channels --features channel-acp-server --lib turn_tool_events

All three pass locally.

Security & Privacy Impact

  • New permissions, capabilities, or filesystem access scope? No. The change is the missing-back-channel fix; supervised tool gating works the same way every other channel does.
  • New external network calls? No.
  • Secrets / tokens / credentials handling changed? arguments_summary is generated upstream by summarize_args, which already redacts #[secret] and #[derived_from_secret] fields. Raw args are not passed across the WS boundary.
  • PII or real identities in diff, tests, fixtures, or docs? No.

Compatibility

  • Backward compatible? Yes. Clients that do not understand approval_request frames simply do not respond, in which case the channel times out and auto-denies after 120s, matching the existing channel default.
  • Config / env / CLI surface changed? No. The 120s timeout is hardcoded for now and matches TelegramConfig::approval_timeout_secs's default.

Rollback

git revert <merge-sha>. The runtime variant and the WS channel revert in lockstep; no on-disk state to clean up.

…se WS protocol

Step 1 of the gateway-approval-bypass fix. Document the WS frame
contract for tool approvals so a maintainer reviewer can sign off on
the wire shape before the deeper plumbing lands.

This commit only adds documentation:

- Module-level rustdoc gains a "Tool approvals (in progress, see zeroclaw-labs#6207)"
  section spelling out the JSON shape both directions, the timeout +
  auto-deny semantics, and the parity goal with
  process_channel_message + ApprovalManager.
- A TODO comment in the TurnEvent forwarder names the exact code
  locations that still need wiring: TurnEvent::ApprovalRequest in
  zeroclaw-runtime, the approval gate in
  crates/zeroclaw-runtime/src/agent/loop_.rs:1572-1619, and the
  ApprovalManager construction site that does not yet exist on the
  gateway path.

Not in this commit (deliberately separate to keep the protocol
contract reviewable in isolation):

- TurnEvent::ApprovalRequest variant on the runtime side.
- Threading Option<&ApprovalManager> through Agent::turn_streamed (or
  switching the gateway to the same channel-message path the actual
  channels use, which gives approvals for free).
- Web UI: rendering the approval card with Approve / Deny / Always
  buttons in web/src/.
- Auto-deny on client disconnect / timeout.

Refs zeroclaw-labs#6207.
@singlerider singlerider added bug Something isn't working gateway Auto scope: src/gateway/** changed. labels May 5, 2026
@singlerider singlerider requested a review from Audacity88 May 5, 2026 04:14
@singlerider singlerider added runtime Auto scope: src/runtime/** changed. security Auto scope: src/security/** changed. risk: high Auto risk: security/runtime/gateway/tools/workflows. labels May 5, 2026
@singlerider singlerider added the size: XS Auto size: <=80 non-doc changed lines. label May 5, 2026
@singlerider singlerider self-assigned this May 5, 2026
@github-actions github-actions Bot removed gateway Auto scope: src/gateway/** changed. runtime Auto scope: src/runtime/** changed. security Auto scope: src/security/** changed. labels May 5, 2026
CI flagged the standalone comment block between match arms; rustfmt
prefers it hung off the closing brace of the previous arm. Apply
`cargo fmt` to satisfy the lint.

Refs zeroclaw-labs#6207.
The agent's tool loop already iterates `channel_handles.ask_user`
looking for a Channel that can answer ChannelApprovalRequest. The
gateway WS path constructed an Agent but never registered itself as
such a channel, so any supervised tool call routed through /ws/chat
would auto-deny with no operator surface.

Add a TurnEvent::ApprovalRequest variant on the runtime side and a
WsApprovalChannel that, when the agent calls request_approval, mints
a request_id, emits the event over a connection-level mpsc, and
parks on a oneshot waiter. The WS forward task drains the mpsc onto
the wire; inbound `approval_response` frames pop the waiter from the
shared pending map. Timeouts and disconnects auto-deny so the agent
never sees None and never falls back to the implicit no-op path.

Wire format:

    Server -> Client: { "type": "approval_request",
                        "request_id": "<uuid>", "tool": ...,
                        "arguments_summary": ...,
                        "timeout_secs": 120 }
    Client -> Server: { "type": "approval_response",
                        "request_id": "<uuid>",
                        "decision": "approve" | "deny" | "always" }

`arguments_summary` is the redacted summary the runtime synthesises
via `zeroclaw_runtime::approval::summarize_args`; raw args (including
secret-bearing fields) never reach the wire. The ACP channel keeps
its own `request_choice` path; ApprovalRequest events are dropped on
the ACP transport since approvals there flow through
session/request_permission.

Closes zeroclaw-labs#6207
@singlerider singlerider added this to the v0.7.5 milestone May 5, 2026
@singlerider singlerider changed the title feat(gateway/ws): scaffold #6207 approval-request/response WS protocol feat(gateway/ws): tool-approval back-channel via WsApprovalChannel May 5, 2026
@singlerider singlerider marked this pull request as ready for review May 5, 2026 06:29
@Audacity88 Audacity88 added size: M Auto size: 251-500 non-doc changed lines. and removed size: XS Auto size: <=80 non-doc changed lines. labels May 5, 2026
Copy link
Copy Markdown
Collaborator

@Audacity88 Audacity88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the current PR snapshot at head 58a98d4, linked issue #6207, current labels and status checks, the absence of prior comments/reviews, the five-file PR diff, and the adjacent runtime/channel approval contracts. Line references below include current PR-head source and adjacent source where the behavior depends on existing contracts. The request-id/oneshot shape is a good direction, but I think this needs changes because the new path does not actually round-trip an approval during a WebSocket turn yet, and the new WS contract relies on a redaction guarantee the current helper does not provide.

🟢 What looks good — reusing the channel approval contract

Using ChannelApprovalRequest / ChannelApprovalResponse for the gateway path is the right abstraction. Once the transport pump is wired into the active turn, this should let WS approvals share the same approve/deny/always policy and audit path as other channel-backed approvals instead of inventing a separate gateway-only policy.

🔴 Blocking — approval requests are queued in a loop that is not running during the turn

handle_socket awaits process_chat_message(...) when it receives a message frame (crates/zeroclaw-gateway/src/ws.rs:536; the first-message fallback has the same shape around :393). While that await is in progress, the outer socket select! is suspended. That matters because the new approval pump also lives in that outer loop: outbound approval_request frames are drained from approval_event_rx at ws.rs:552-575, and inbound approval_response frames are read at ws.rs:466-490.

So a supervised tool call can insert the pending oneshot and send the TurnEvent::ApprovalRequest in crates/zeroclaw-gateway/src/ws_approval.rs:88-108, but the client never sees that request because the only task that forwards it is waiting for the turn to finish. The same blocked loop also cannot read the user's approval response. The practical result is timeout/deny, and then the outer loop can resume and send a stale approval prompt after the pending entry was already removed.

That means this does not satisfy #6207's core requirement yet: the web client still cannot approve the supervised tool call before timeout. I would move approval-event forwarding and approval_response handling into the same concurrent turn driver, or route the approval request through the per-turn event stream that process_chat_message already drains while also keeping response reads active during the turn. A regression test should prove that /ws/chat sends an approval_request, accepts an approve response, and executes the tool before the timeout.

🔴 Blocking — arguments_summary is documented as redacted but the current helper only stringifies values

The new API/WS comments say arguments_summary is secret-redacted (crates/zeroclaw-api/src/agent.rs:36-39, crates/zeroclaw-gateway/src/ws.rs:42-51), and the PR body says raw args with secret-bearing fields never reach the wire. But zeroclaw_runtime::approval::summarize_args currently iterates every key/value and truncates the stringified value (crates/zeroclaw-runtime/src/approval/mod.rs:224-247). WsApprovalChannel copies that string into the new event (crates/zeroclaw-gateway/src/ws_approval.rs:92-96), and the gateway forwards it directly to the browser (crates/zeroclaw-gateway/src/ws.rs:560-566, :705-718).

That means an approval request with an argument such as api_key, token, or another secret-shaped field can still put the secret value into the approval_request frame, just truncated. The helper gap may be pre-existing for other approval transports, but this PR makes that string a documented WebSocket protocol field and uses the claimed redaction as the security/privacy mitigation for the new gateway boundary. I would either route the summary through a real redaction/scrubbing helper before creating or forwarding the approval event, with coverage for secret-shaped fields, or remove the redaction claim until that guarantee actually exists. Given the gateway/security boundary and the PR's explicit privacy claim, I would block this until the wire frame cannot expose secret-bearing tool arguments.

Copy link
Copy Markdown
Collaborator

@WareWolf-MoonWall WareWolf-MoonWall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — PR #6387 · feat(gateway/ws): tool-approval back-channel via WsApprovalChannel

Verdict: --request-changes
Head reviewed: 58a98d49071ab03f891d3cb7c9dc9943c7ce64d3


Take-stock before writing

Active block holders:

  • Audacity88: CHANGES_REQUESTED — two 🔴 blockers (turn-loop concurrency; redaction claim vs. implementation)
  • JordanTheJet: Requested, no review posted
  • WareWolf-MoonWall: Requested (this review)
  • theonlyhennygod: Requested, no review posted

Risk classification: This PR touches crates/zeroclaw-gateway/src/** and crates/zeroclaw-api/src/agent.rs, both of which are explicitly listed as High risk in AGENTS.md. Per FND-006 §4.5, code at a security or trust boundary requires explicit, demonstrable correctness — not just a passing CI run.

I independently verified both of Audacity88's blockers against the live source. Both are real. I have additional analysis below, including one finding about a duplicated serialization path that sharpens the picture on the concurrency issue.


🔴 [blocking] — Approval responses can never arrive during a turn: the receive path is not concurrent with process_chat_message

Audacity88 identified this correctly. I want to be precise about the mechanism because the fix path matters.

How the current code works (before and after this PR):

handle_socket runs an outer tokio::select! loop with two arms: client_msg = receiver.next() and event = broadcast_rx.recv(). When a message frame arrives, the client_msg arm takes control and calls:

process_chat_message(&state, &mut agent, &mut sender, &content, &session_key).await;

While this .await runs, the outer select! is fully suspended. receiver.next() is not being polled. No new WebSocket frames can be read from the client.

What the PR adds:

  1. An approval_event = approval_event_rx.recv() arm in the outer select! — unreachable during a turn for the same reason.
  2. An approval_response handler inside the client_msg arm — also unreachable during a turn, because it's gated on receiver.next() firing, which the suspended select! cannot do.
  3. Inside process_chat_message's internal forward_fut, a TurnEvent::ApprovalRequest match arm that serializes the frame and sends it to the client.

Item (3) is actually correct: process_chat_message uses tokio::join!(turn_fut, forward_fut) internally, so TurnEvent::ApprovalRequest events emitted by the agent can reach the client during a turn — the outbound path works.

The problem is exclusively the inbound path: the client's approval_response frame arrives at the WebSocket as a new message, and there is nothing polling receiver while process_chat_message is running. The WsApprovalChannel's oneshot receiver parks indefinitely until the timeout, then the agent receives Deny and the turn completes. Only then does the outer select! resume and drain the now-stale approval_response frame — but the pending entry was already removed on timeout.

Concretely: the operator sees a prompt, clicks Approve, and the tool gets denied anyway. This doesn't satisfy #6207.

What a resolution looks like:

The process_chat_message signature already holds &mut sender (for streaming output) and &mut agent. A clean fix would thread a mutable reference to the receiver through process_chat_message as well, and drive it concurrently inside the function — alongside turn_fut and forward_fut — routing any approval_response frames to the shared pending_approvals map while the turn runs. The internal tokio::join! (or a tokio::select! inside forward_fut) could handle the three concurrent streams: turn events → wire, broadcast events → wire, and client messages → pending map.

A regression test should demonstrate the complete round-trip: /ws/chat sends an approval_request, the client sends an approve response, and the tool executes successfully before the 120s timeout.


🔴 [blocking] — The arguments_summary redaction claim in the wire-protocol comments and PR body is false

Audacity88 identified this too. I verified it against the source.

The PR introduces arguments_summary as a documented wire-protocol field with an explicit security property. The module doc at crates/zeroclaw-gateway/src/ws.rs states:

"The runtime must not echo any #[secret] or #[derived_from_secret] field (auth tokens, API keys, OAuth secrets) into the summary."

The PR body says:

"arguments_summary is generated upstream by summarize_args, which already redacts #[secret] and #[derived_from_secret] fields."

I read summarize_args in crates/zeroclaw-runtime/src/approval/mod.rs. The function iterates every key/value in the arguments object, formats each as key: truncated_value, and joins them. It does not check for any attribute annotations, does not consult a secret-field registry, and does not skip or redact any key. A tool argument named api_key or oauth_token appears in the summary with its value truncated to 80 characters but otherwise intact.

This means the documented security guarantee does not exist. A browser client receiving an approval_request frame for a tool call that includes a credential-bearing argument will see that credential in the summary. Per FND-006 §4.5:

"An error in an authorization path cannot [be recovered from gracefully]. Know which kind of function you are writing, and let that determination drive how aggressively you surface failures from it."

The PR makes arguments_summary a gateway wire-protocol boundary — exactly the location where this guarantee must be enforced. There are two valid resolutions:

Option A: Implement actual secret-field scrubbing in summarize_args (or in a new scrub_args_for_summary function called before the event is created). This means checking the argument keys against a registry of known secret-shaped field names and replacing their values with [redacted]. A test with api_key, token, password, and oauth_secret arguments must prove redaction. This is the right long-term fix.

Option B: Remove the redaction claim from the wire-protocol docs and the PR body until Option A lands. Change the security impact statement to accurately describe what summarize_args actually does (truncates to 80 chars). Accept that the current behavior has a known gap and track it explicitly in a follow-up issue.

Given the gateway/security boundary classification and the explicit privacy claim in the PR description, I would require Option A before this merges, or a clear acknowledgement of the gap via Option B with a tracking issue. Leaving a false security claim in the wire-protocol documentation is worse than acknowledging the known limitation.


🟢 [praise] — Using ChannelApprovalRequest/ChannelApprovalResponse is the right abstraction

Reusing the existing channel approval contract for the gateway path is the architecturally correct choice. Once the transport concurrency is fixed, WS approvals will share the same approve/deny/always policy and audit path as Telegram, CLI, and ACP approvals — no gateway-specific policy divergence. The request-id/oneshot shape is clean and testable. This is worth preserving as the foundation of the fix.


🟢 [praise] — ACP TurnEvent::ApprovalRequest drop is correctly handled

The notification_for_turn_event change in acp_server.rs returns Option<JsonRpcNotification> and the ApprovalRequest arm returns None. The comment is accurate: ACP sessions use session/request_permission for approval, not the turn-event stream. The PR correctly avoids routing ApprovalRequest events over ACP — and the tests are updated to expect("...") on the Some(_) variants. This is the right shape.


🟡 [warning] — process_chat_message has a duplicate ApprovalRequest serialization path that should be removed

The PR adds TurnEvent::ApprovalRequest handling in two places:

  1. Inside process_chat_message's forward_fut (ws.rs around line 705) — this path is reachable and will forward the frame to the client during a turn. This is the path that should survive.
  2. In the outer select!'s approval_event_rx arm (ws.rs around line 552) — this arm drains a separate mpsc channel and is unreachable during a turn (blocked by the process_chat_message await). It exists because WsApprovalChannel::request_approval sends to approval_event_tx, which feeds approval_event_rx. But if (1) is already forwarding the frame, this arm is either redundant or represents a design conflict about which event stream carries the approval events.

If the fix for the concurrency blocker threads receiver through process_chat_message (as described above), the approval_event_rx arm in the outer select! may no longer be needed at all — the turn's event channel already carries TurnEvent::ApprovalRequest. Clarifying which path is canonical before adding more code to either will make the fix cleaner.


🔵 [suggestion] — The 120s hardcoded timeout should be a per-connection or per-session config knob

The PR body notes: "The 120s timeout is hardcoded for now and matches TelegramConfig::approval_timeout_secs's default." This is fine for an initial implementation. When this lands, the follow-up should plumb GatewayConfig::ws_approval_timeout_secs (or reuse an existing approval-timeout field) through handle_socket so operators on low-latency setups or batch automation contexts can tune it. Filing a tracking issue for this before merge would prevent it from becoming invisible drift.


🔵 [suggestion] — register_channel("ws", ...) unconditionally registers the approval channel

WsApprovalChannel is registered before the main loop regardless of whether the agent has any supervised tools. In sessions where no tool supervision is active, this burns a parking_lot::Mutex and two Arc allocations per connection and adds a no-op channel to the ask_user iteration. Consider registering only when the agent's ApprovalManager is configured — or document that the cost is acceptable for the connection lifecycle. Non-blocking.


Summary

Both of Audacity88's 🔴 blockers are real and independently verified:

  1. The approval response round-trip doesn't work because receiver is not polled during the turn. The outbound frame reaches the client; the inbound response never does.
  2. summarize_args doesn't redact secret-shaped fields, making the documented wire-protocol security guarantee false.

The abstraction choice (channel approval contract, request-id/oneshot, ACP drop) is sound and worth building on. Fix the transport concurrency and either implement real secret-field redaction or honestly document the current behavior, and this will be in good shape.

@WareWolf-MoonWall
Copy link
Copy Markdown
Collaborator

@JordanTheJet — milestone alignment needed: this PR does not clearly fit within the scope boundary of any open milestone. Please advise on placement or deferral.

…n summary

Two blocking findings from @Audacity88 / @WareWolf-MoonWall review:

1. **Approval round-trip during a turn.** The outer `select!` in
   `handle_socket` is suspended while `process_chat_message.await` runs,
   so neither the new `approval_event_rx` arm nor the inbound
   `approval_response` arm fired during a turn — the operator clicked
   "approve", the WS client wrote the response, and the agent timed out
   denying anyway. `process_chat_message` now takes the receiver and
   approval-event channels by mutable reference and the inner
   `forward_fut` drives all three streams (turn events, approval-request
   events, client `approval_response` frames) in a single `tokio::select!`
   so the round-trip completes within the active turn.

2. **`arguments_summary` redaction is now real.**
   `zeroclaw_runtime::approval::summarize_args` checked nothing and
   stringified every value, despite the gateway WS module-doc claiming
   `#[secret]` redaction. Added a `looks_like_secret_key` heuristic that
   substring-matches lowercased argument keys against the standard
   credential names (secret, password, token, api_key, auth, bearer,
   private_key, credential, ...) and replaces matching values with
   `[redacted]` before truncation. Three tests cover the wire boundary:
   common secret-shaped names, non-secret pass-through, and
   case-insensitive substring matches like `X-API-Key` / `DBPassword`.

The redaction is best-effort — a tool that names its credential field
something idiosyncratic still surfaces. Tool authors' typed config and
`#[secret]` annotations remain the long-term truth source; this is the
gateway boundary's last-resort scrubber.
@singlerider singlerider dismissed stale reviews from Audacity88 and WareWolf-MoonWall May 6, 2026 10:19

Resolved at 4321392 — both blocking findings addressed. (1) process_chat_message now takes the receiver and approval-event channels by &mut and drives all three streams (turn events, approval-request events, client approval_response frames) in one tokio::select!, so the approval round-trip completes within the live turn instead of timing out. (2) summarize_args now redacts secret-shaped argument keys (substring match on secret/password/token/api_key/auth/bearer/private_key/credential, case-insensitive) before serialising to the wire — 3 new tests pin the redaction contract.

@singlerider singlerider requested a review from Audacity88 May 6, 2026 10:22
Copy link
Copy Markdown
Collaborator

@WareWolf-MoonWall WareWolf-MoonWall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — PR #6387 · feat(gateway/ws): tool-approval back-channel via WsApprovalChannel

Verdict: --approve
Head reviewed: 4321392


Take-stock

Prior review state at 58a98d4:

Both prior blocks are cleared. I re-read the full diff at the new head against both sets of prior findings.

Active blocks going into this review: none.


✅ [resolved] — Blocker 1: approval responses can now arrive during a turn

The core concurrency issue is fixed. process_chat_message now takes &mut receiver, &mut approval_event_rx, and &pending_approvals. Inside forward_fut, a tokio::select! with three biased arms runs concurrently:

  • client_msg = receiver.next() — handles inbound approval_response frames in real time
  • approval = approval_event_rx.recv() — drains outbound ApprovalRequest events to the wire
  • event_opt = event_rx.recv() — handles normal turn events (Chunk, Thinking, ToolCall, etc.)

The &mut receiver borrow ensures the outer select! cannot double-poll receiver while a turn is running, and the inner select! is biased, which gives the response-receipt arm priority so an operator's approval is never starved by a burst of streaming Chunk events. The round-trip is now structurally correct: the agent parks on the oneshot in WsApprovalChannel::request_approval, the inner loop reads the client's approval_response frame, and pending_approvals.lock().remove(request_id) delivers the decision to the waiter before the 120s timeout. This satisfies #6207.

✅ [resolved] — Blocker 2: summarize_args redaction claim now matches the implementation

The new looks_like_secret_key(key: &str) -> bool heuristic covers api_key, api-key, apiKey, oauth_token, secret, password, auth_token, bearer, client_secret, private_key, and their common variants. summarize_args now substitutes "[redacted]" for any matching key before truncation. Three new tests lock this in: summarize_args_redacts_known_secret_key_names, summarize_args_keeps_non_secret_values, and summarize_args_redaction_is_case_insensitive_and_substring_aware. The wire-protocol doc comments in ws.rs and the TurnEvent::ApprovalRequest doc in agent.rs now accurately describe what the helper actually does. This resolves the false security guarantee that was the most concerning finding in the prior round.


🟢 [praise] — biased select in forward_fut is the right call

Using biased in the inner tokio::select! gives client_msg (the operator's response frame) priority over approval-event forwarding and turn-event dispatch. Under a sustained stream of Chunk events a non-biased select could statistically starve the approval_response arm, burning timeout budget before the decision registers. This is the kind of detail that is invisible until it fires in production at 119s, and catching it here matters.

🟢 [praise] — ACP handling preserved correctly

notification_for_turn_event returning Option<JsonRpcNotification> with the ApprovalRequest arm returning None is clean. The call site now pattern-matches on Some before writing. The comment explaining why — ACP uses session/request_permission for approval, not the turn-event stream — makes the intent visible to anyone who reads this later. Tests updated correctly.


🟡 [warning] — WS approval round-trip regression test is still absent from CI evidence

Both prior reviews requested a test demonstrating the complete WS round-trip: gateway sends approval_request, client sends approve, tool executes before timeout. The current validation evidence shows cargo test --package zeroclaw-channels --lib turn_tool_events — that exercises ACP approval semantics, not the WS path. The fix is structurally correct and readable, but the first time this path exercises end-to-end under real conditions is the first time it will have run end-to-end at all. A focused #[tokio::test] that drives handle_socket with a mock supervised-tool provider, triggers an approval_request frame, routes a fake approval_response through the socket half, and asserts the tool executed before timeout would eliminate this gap. I am not blocking on it, but please track this as a follow-up that should land before supervised-mode WS is called out in release notes as a tested feature.


🔵 [suggestion] — Outer select! approval arm now serves as a between-turn safety drain

The approval_event = approval_event_rx.recv() arm in the outer handle_socket select runs only between turns, since process_chat_message holds &mut approval_event_rx while running (and the &mut borrow prevents double-polling). In normal operation no ApprovalRequest events should be queued between turns. A brief inline comment — "between-turn safety drain; primary inbound/outbound path is inside process_chat_message" — would help the next reader understand why both sites exist without having to trace the ownership chain.

🔵 [suggestion] — WS_APPROVAL_TIMEOUT_SECS still hardcoded at 120s

120s is a reasonable default and aligns with TelegramConfig::approval_timeout_secs. For operators on slower dashboard connections or in scenarios where the approval UI is a few navigation clicks away, this may be tight. A supervision.approval_timeout_secs config knob (defaulting to 120) is a small follow-up that lets operators tune without touching code.


Both prior blocking findings are resolved at 4321392. The concurrency fix is structurally correct, the redaction implementation is honest about what it provides and backed by tests, and the ACP interaction is correctly preserved. The 🟡 (WS round-trip test) is the one item I would want to see land before this feature is promoted in release notes. Approving now on the structural correctness. Good iteration, @singlerider.

Resolved conflicts:

- crates/zeroclaw-api/src/agent.rs: kept both new TurnEvent variants
  (ApprovalRequest from this branch, Usage from master). Edited
  master's bullet to drop em dashes per project style.
- crates/zeroclaw-channels/src/orchestrator/acp_server.rs:
  notification_for_turn_event keeps the Option<JsonRpcNotification>
  signature so ApprovalRequest can return None (ACP routes approvals
  through session/request_permission, not session/update). Master's
  Usage arm is preserved as unreachable!() since the upstream filter at
  the call site continues past Usage events.
- crates/zeroclaw-gateway/src/ws.rs: kept the three-stream
  tokio::select! (client_msg / approval_event_rx / event_rx) and folded
  master's Usage aggregation (total_input_tokens / total_output_tokens
  for the done-frame) into the event_rx arm so both behaviours coexist.
@theonlyhennygod theonlyhennygod merged commit 9544b13 into zeroclaw-labs:master May 6, 2026
12 checks passed
github-actions Bot pushed a commit that referenced this pull request May 6, 2026
…annel (#6387)

The agent's tool loop iterates `channel_handles.ask_user` looking for a
Channel that can answer `ChannelApprovalRequest`. The gateway WS path
constructed an Agent but never registered itself as such a channel, so
any supervised tool call routed through `/ws/chat` would auto-deny with
no operator surface.

Two-part fix:

- New `TurnEvent::ApprovalRequest { request_id, tool_name,
  arguments_summary, timeout_secs }` variant on the runtime side.
- `WsApprovalChannel` in the gateway. When the agent calls
  `request_approval`, the channel mints a request_id, emits the event
  over a connection-level mpsc, and parks on a oneshot waiter. The WS
  forward task drains the mpsc onto the wire; inbound
  `approval_response` frames pop the waiter from the shared pending
  map. Timeouts and disconnects auto-deny so the agent never sees
  `Ok(None)` and never falls back to the implicit no-op path.

Wire format:

    Server -> Client: { "type": "approval_request",
                        "request_id": "<uuid>", "tool": ...,
                        "arguments_summary": ...,
                        "timeout_secs": 120 }
    Client -> Server: { "type": "approval_response",
                        "request_id": "<uuid>",
                        "decision": "approve" | "deny" | "always" }

`arguments_summary` is the redacted summary the runtime synthesises via
`zeroclaw_runtime::approval::summarize_args`; raw args (including
secret-bearing fields) never reach the wire. A `looks_like_secret_key`
heuristic substring-matches argument keys against standard credential
names and replaces matching values with `[redacted]` before truncation.

The ACP channel keeps its own `request_choice` path; `ApprovalRequest`
events are dropped on the ACP transport since approvals there flow
through `session/request_permission`.

Closes #6207

Co-authored-by: Shane Engelman <contact@shane.gg> 9544b13
theonlyhennygod pushed a commit that referenced this pull request May 6, 2026
…6387)

The agent's tool loop iterates `channel_handles.ask_user` looking for a
Channel that can answer `ChannelApprovalRequest`. The gateway WS path
constructed an Agent but never registered itself as such a channel, so
any supervised tool call routed through `/ws/chat` would auto-deny with
no operator surface.

Two-part fix:

- New `TurnEvent::ApprovalRequest { request_id, tool_name,
  arguments_summary, timeout_secs }` variant on the runtime side.
- `WsApprovalChannel` in the gateway. When the agent calls
  `request_approval`, the channel mints a request_id, emits the event
  over a connection-level mpsc, and parks on a oneshot waiter. The WS
  forward task drains the mpsc onto the wire; inbound
  `approval_response` frames pop the waiter from the shared pending
  map. Timeouts and disconnects auto-deny so the agent never sees
  `Ok(None)` and never falls back to the implicit no-op path.

Wire format:

    Server -> Client: { "type": "approval_request",
                        "request_id": "<uuid>", "tool": ...,
                        "arguments_summary": ...,
                        "timeout_secs": 120 }
    Client -> Server: { "type": "approval_response",
                        "request_id": "<uuid>",
                        "decision": "approve" | "deny" | "always" }

`arguments_summary` is the redacted summary the runtime synthesises via
`zeroclaw_runtime::approval::summarize_args`; raw args (including
secret-bearing fields) never reach the wire. A `looks_like_secret_key`
heuristic substring-matches argument keys against standard credential
names and replaces matching values with `[redacted]` before truncation.

The ACP channel keeps its own `request_choice` path; `ApprovalRequest`
events are dropped on the ACP transport since approvals there flow
through `session/request_permission`.

Closes #6207

Co-authored-by: Shane Engelman <contact@shane.gg>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working risk: high Auto risk: security/runtime/gateway/tools/workflows. size: M Auto size: 251-500 non-doc changed lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Web dashboard / WebSocket gateway path bypasses ApprovalManager — supervised tool approvals never surface in daemon web UI

4 participants